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ABSTRACT 

We compute the constraints on a "standard" 10 parameter cold dark matter (CDM) model from the 
most recent CMB data and other observations, exploring 30 million discrete models and two continuous 
parameters. Our parameters are the densities of CDM, baryons, neutrinos, vacuum energy and curvature, 
the reionization optical depth, and the normalization and tilt for both scalar and tensor fluctuations. 
Our strongest constraints are on spatial curvature, —0.24 < 17k < 0.38, and CDM density, /i 2 f2 c dm < 0.3, 
both at 95%. Including SN la constraints gives a positive cosmological constant at high significance. We 
explore the robustness of our results to various assumptions. We find that three different data subsets 
give qualitatively consistent constraints. Some of the technical issues that have the largest impact are 
the inclusion of calibration errors, closed models, gravity waves, reionization, nucleosynthesis constraints 
and 10-dimensional likelihood interpolation. 

Subject headings: cosmic microwave background — methods: data analysis 



1. INTRODUCTION 

The past year has yet again seen dramatically im- 
proved measurements of the Cosmic Microwave Back- 
ground (CMB) power spectrum, with the Python, Viper, 
Toco and Boomerang experiments suggesting a first acous- 
tic peak with a fairly well-defined height and position. 
Further great improvements are expected shortly from 
the Antarctic Boomerang flight, the MAP satellite and 
other experiments, with the potential to accurately mea- 
sure about ten cosmological parameters (Jungman et al. 
1996; Bond et al. 1997; Zaldarriaga et al. 1997; Efstathiou 
& Bond 1998), especially when combined with galaxy red- 
shift surveys (Eisenstein et al. 1999), supernovae la (SN 
la) observations (White 1998) or gravitational Lensing 
(Hu & Tegmark 1999). 

Comparing these observations with theoretical predic- 
tions to achieve this goal in practice is highly non-trivial, 
even aside from the experimental challenge of controlling 
systematic errors, and is often broken down into several 
steps, schematically illustrated in Figure [t| 

1. Compress the time-ordered data set into sky maps 
at various frequencies, so as to minimize the effect of 
correlated detector noise, scan-synchronous offsets, 
and other non-sky signals (Wright 1996; Tegmark 
1997a). 

2. Compress the multi- frequency maps into a single 
CMB map so as to minimize the contribution of 
detector noise and foreground contamination (see 
Tegmark et al. 2000 and references therein). 

3. Compress this CMB map into measurements of the 
angular power spectrum on various angular scales 
(Tegmark 1997b; Bond, Jaffe & Knox 1998), a step 
nicknamed "radical compression" by Bond et al. 

4. Convert these power spectrum measurements into 
constraints on cosmological parameters. 
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Fig. 1. — The analysis of a large CMB data set is conveniently bro- 
ken down into four steps: mapmaking, foreground removal, power 
spectrum extraction and parameter estimation. 



This paper is focused on the last of these four steps, de- 
scribing a method and applying it to all currently available 
data. 

Since fast and accurate software is now available for 
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computing how the theoretically predicted power spec- 
trum depends on the cosmological parameters, this last 
step may at first appear rather trivial: just run a code 
such as CMBfast (Seljak & Zaldarriaga 1996) at a fine grid 
of points in parameter space and perform a \ 2 fit of the 
corresponding theoretical power spectra to the observed 
data. The problem is that the currently most popular 
cosmological model has of order N — 10 free parameters, 
making such an ./V-dimensional parameter grid rather huge 
and unwieldy. There arc also additional challenges related 
to evaluating the likelihood function (Bond et al. 1998; 
Bartlett et al. 1999) that we will discuss in more detail 
below. 

The first analyses based on COBE DMR used N — 2 pa- 
rameters, the scalar quadrupolc normalization A s and tilt 
n s of the power spectrum {e.g., Smoot et al. 1992; Gorski 
et al. 1994; Bond 1995; Bunn & Sugiyama 1995; Tegmark 
& Bunn 1995). Since then, many dozens of papers have ex- 
tended this to incorporate more data and parameters, with 
recent work including Bunn & White (1997); de Bernardis 
et al. (1997); Ratra et al. (1999); Hancock et al. (1998); 
Lesgourges et al. (1999); Bartlett et al. (1998); Webster et 
al. (1998); Lineweaver & Barbosa (1998ab); White (1998); 
Bond & Jaffe (1998); Gawiser & Silk (1998); Contaldi et 
al. (1999), Griffiths et al. 1999; Mclchiorri et al. (1999); 
Rocha (1999). 

In an important paper, Lineweaver (1998) made the leap 
up to N — 6 parameters: n s , A s , the Hubble constant 
h and the relative densities fi c d m , £lb and J7a of CDM, 
baryons and vacuum energy, thereby setting a new stan- 
dard. Tegmark (1999, hereafter T99) pushed on to N = 8 
by adding the reionization optical depth r and the gravity 
wave amplitude A t . Efstathiou et al. (1999), Efstathiou 
(1999), Bahcall et al. (1999), Dodelson & Knox (2000) 
and Melchiorri et al. (2000) performed analyses with dif- 
ferent techniques, better data and around 6 parameters, 
all finding interesting joint constraints on J1a and the mat- 
ter density. Despite this progress, however, a number of 
issues still need to be improved to do justice to the ever- 
improving data. 

Perhaps the most glaring problem is that no closed mod- 
els (White & Scott 1996) have ever been computed ex- 
actly in these analyses, except for that of Melchiorri et 
al. (2000), since the CMBfast software was limited to flat 
and open models. We remedy this in the present paper 
by using version 3.2 of CMBfast (Zaldarriaga & Seljak 
1999), which is generalized to closed models. A new code 
by Challinor et al. (2000), based on CMBfast, also does 
closed models, agreeing well with CMBfast. 

Another problem with all previous analyses is that they 
assumed the massive neutrino density Q u to be zero, al- 
though there is strong evidence from both the atmospheric 
and solar neutrino anomalies that Q v > 0. Since these 
particle physics constraints are only sensitive to the dif- 
ferences between the (squared) masses of the various neu- 
trinos, they do not imply that neutrinos are astrophysi- 
cally uninteresting. Indeed, because the CMB and matter 
power spectra can place some of the most stringent up- 
per limits on neutrino masses (Hu et al. 1998), it would 
be a real pity to omit this aspect of the analysis. Just as 
increasing f2 c d m suppresses the acoustic peaks, increasing 
£l v suppresses does so by a comparable amount. Indeed, 
these two parameters become nearly degenerate for large 
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Fig. 2. — The band power measurements used. 



f2„ , corresponding to neutrinos massive enough to be fairly 
nonrclativistic at the relevant rcdshifts, so the inclusion of 
neutrinos will, among other things, weaken the lower limit 

On ficdm- 

Another weakness of the T99 analysis was that it as- 
sumed that the relative amplitude r = A t /A s of gravity 
waves was linked to the tensor spectral index by the in- 
flationary consistency relation r = —7n t (Liddlc & Lyth 
1992), although one of the most exciting applications of 
CMB data will be to test this relation. We will remedy 
both of these problems by extending our parameter space 
to N = 10 dimensions, including both D, v and nt as free 
parameters. 

Finally, as we will discuss at length below, there are a 
number of areas where accuracy has been unsatisfactory 
and can be substantially improved. 

The rest of this paper is organized as follows. We de- 
scribe our method in Section 0, apply it to the available 
data in Section □ and summarize our conclusions in Sec- 
tion BL Some technical details regarding marginalization 
are derived in the Appendix. 

2. METHOD 



2.1. The problem 

Our data consists of the n — 65 band power mea- 
surements 5Tf listed in Table 1 and shown in Figure El 
i = l,...,n. The band power measurement di = 5T~ 
probes a weighted average of 5Tf = £(£ + l)Ce/2n, 



(d i ) = {ST?) = ^WiSTl 



(1) 



where W\ is the band-power window function (distinct 
from the variance window function; see Knox 1999). These 
known weights W\ reflect which angular scales the mea- 
surement is sensitive to. 

The power spectrum in turn depends on our vector of 
cosmological parameters p in a complicated fashion CV(p) 



that we use CMBfast to compute. The scatter in the rela- 
tion between di and (di) due to detector noise and sample 
variance is described by a likelihood function Ci(di\ Cg(p)), 
the probability distribution for di given p. If the errors in 
the different data points were all independent, then the 
combined likelihood of observing the set of all data given 
p would be simply 



£(data;p) = JjA(dj; Ct(p)). 



(2) 



This is complicated by the fact that some m easu rements 
are correlated, as will be discussed in Section 2.7. 



Our problem is to evaluate this likelihood function in 
the 10-dimensional parameter space that p inhabits. To 
obtain Bayesian constraints on individual parameters or 
joint constraints on interesting pairs (such as Sl m and £Ia), 
we then marginalize over the remaining parameters with 
appropriate priors. 

2.2. Breaking it into four sub-problems 

If we had infinite computing resources, the solution 
would be straightforward: compute the theoretical CMB 
power spectrum Ci(p) with the CMBfast software (Seljak 
& Zaldarriaga 1996) and the corresponding likelihood at a 
fine grid of points in the iV-dimensional parameter space. 
In practice, this is inconvenient. With M grid points in 
each dimension, M power spectra must be computed. 
Even if we take M as low as 10, the amount of work thus 
grows by an order of magnitude for each additional pa- 
rameter. With 1 minute per power spectrum calculation, 
N = 10 would translate to over 10 4 years of CPU time. 

Fortunately, the underlying physics (see e.g. Hu et al. 
1997 for a review) allows several numerical simplifications 
to be made. We will adopt the approximation scheme 
used in T99 with additional improvements as described 
below. Our method conveniently separates into four sep- 
arate steps. 

• Step 1: Run CMBfast many times for three partic- 
ular subsets of the parameter grid. The results are 
three large files: one with tensor power spectra, one 
with scalar power spectra for I < 100 and one with 
scalar power spectra for £ > 100. 

• Step 2: Interpolate these spectra onto larger sub- 
sets of the parameter grid. The results are two huge 
files with 7-dimensional model grids, one for scalars 
and one for tensors. These two files allow any power 
spectrum in the full 10-dimensional model grid to be 
computed almost instantaneously. 



• Step 3: 

model. 



Compute and save the likelihood £ for each 



• Step 4: Perform 10-dimensional interpolation and 
marginalize to obtain constraints on individual pa- 
rameters, constraints in the (f2 m , £l\)-pl&ne, etc. 

Below we will describe each of these four steps in turn. Be- 
fore doing this, however, it is interesting to contrast this 
"huge grid" approach with an alternative strategy. Dodel- 
son k Knox (2000) and Melchiorri et al. (2000) performed 
their analyses without computing and storing such a grid. 
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Instead, they found the maximum-likelihood parameter 
vector p by a direct numerical maximum search, comput- 
ing power spectra with CMBfast on the fly as needed. Sim- 
ilarly constraints in say the (fi m , r2A)-plane were obtained 
by performing a numerical maximum search over the re- 
maining parameters for each (Q m , grid point. One 
drawback of this approach is that everything needs to be 
repeated from scratch if the data set is changed, whereas 
steps 1 and 2 in our method are independent of the data 
set and need only be done once and for all. The same 
drawback applies to exploring different priors. There is 
also no guarantee that CMBfast gets run fewer times with 
this direct search approach, as a numerical search in the 
high-dimensional space tends to require large numbers of 
likelihood evaluations (Dodelson 1999, private communi- 
cation; see also Hannestad 1999). 

2.3. Parameter space 
We choose our 10-dimensional parameter vector to be 



(r, f2 k , f2 A , Wcdm, w b , Wk, n s , n u A s , A t ), 



(3) 



where the physical densities u>i = h 2 Q,i : i =cdm, b, v. The 
advantage of this par ame terization (see Bond et al. 1997) 
will become clear in §2.5. f2k is the spatial curvature, so 



in terms of these parameters, 



h = 



Wcdm + Wb + U v 
1 — S2k — fl\ 



(4) 



This parameter space is identical to that used in T99 ex- 
cept that we have added oj v and replaced h by Oa as a 
free parameter. 

We wish to probe a large enough region of param- 
eter space to cover even quite unconventional models. 
This way, constraints from non-CMB observations can be 
optionally included by explicitly multiplying C(p) by a 
Bayesian prior after Step 3 rather than being hard-wired 
in from the outset. To avoid prohibitively large M, we use 
a roughly logarithmic grid spacing for u m , uj^ and u u , a 
linear grid spacing for fit and Qaj a hybrid for t, w v , n s 
and n t , and (as described below) no grid at all for A s and 
At. We let the parameters take on the following values: 

• t = 0,0.05,0.1,0.2,0.3,0.5,0.8 

• n A = -1.0, -0.8, -0.6, -0.4, 1.0 

• Q k such that Q m = 1 - fi k - Q A = 0.2, 0.4, 2.0 

• Wcdm = 0.02, 0.03, 0.05, 0.08, 0.13, 0.2, 0.3, 0.5, 0.8 

• u) h = 0.003, 0.005, 0.008, 0.013, 0.02, 0.03, 0.05, 0.08, 0.13 

• oj u = 0, 0.02, 0.05, 0.08, 0.13, 0.2, 0.3, 0.5, 0.8 

• n s = 0.50, 0.70, 0.90, 1.00, 1.10, 1.20, 1.30, 1.50, 1.70 

• th = -1.00, -0.70, -0.40, -0.20, -0.10, 

• A s is not discretized 

• A t is not discretized 



Note that the extent of the Sik-grid depends on Oai giving 
a total of 10 x 11 = 110 points in the (fi m , fiA)-plane. 
Our discrete grid thus contains 7x 110x9x9x9x9x 
6 = 30,311,820 models. As will become clear from our 
discussion below, the main limitation on this grid size is 
the disk space used in Step 2 rather than the CPU time 
used in Step 1, so it will probably be desirable to further 
refine it as CMB data gets better. 

2.4. Separating scalars and tensors 

If we were to run CMBfast in the standard way, com- 
puting scalar and tensor fluctuations simultaneously, we 
would have to explore a 9-dimensional model grid since 
only A s drops out as an overall normalization factor. In- 
stead, we compute the scalar fluctuations Cf calar and the 
tensor fluctuations C) ensor separately, normalize them to 
both have a quadrupole of unity and compute the com- 
bined power spectrum as 



C e = A S C, 



scalar 



A t Cf 



(5) 



We therefore only need to compute two 7-dimensional 
grids with CMBfast, one over (r, fik, ^A, w c d m , ^b, Ww, n s ) 
and the other over (r, Okj f^A, ui c d m , uj\,,(jj v ,n t ). The other 
advantage of calculating scalars and tensors separately is 
that tensors only need to be calculated up to an I of 400, 
which saves additional time. 

Allowing 1 minute per model, the scalar grid alone 
would still take about 10 years of CPU time. Most models 
take substantially longer to run, since reionization, cur- 
vature and neutrinos slow CMBfast down. It is therefore 
useful to take advantage of the underlying physics to make 
further simplifications. 

2.5. Separating small and large scales 

The tensor power spectrum depends only weakly on 
Wcdm, ^b and <jj v . We therefore compute the tensor power 
spectrum with the fine grid restricted to (r, fik> ^A> fh), 
using only ultra-course three-point grids for w c dm, and 
uj u . We then fill in the rest of the (w c dmj cdb> w^)- values 
using cubic spline interpolation. 

1/2 

The scalar power spectrum Ci for I <C 100/fim cor- 
responds to fluctuations on scales outside the horizon 
at recombination. This makes it almost independent of 
the causal microphysics that create the familiar acoustic 
peaks, i.e., independent of w m , u) v and 0Jh- We therefore 
compute the scalar power spectrum on large scales with 
the fine grid restricted to (r, ilk, ^A, n s ), using only ultra- 
course three-point grids for w c( j m , uj^ and uj v to to pick 
up weak residual effects aliased down from larger I. We 
then fill in the rest of the (w c dm, c^b, w„)-values using cubic 
spline interpolation. 

For the remaining (high £) part of the power spectrum, 
more radical simplifications can be made. First of all, the 
effect of reionization is mainly an overall suppression of Ci 
by a constant factor e~ 2r on these small scales. Second, 
the effect of changing both f^k and fl\ is merely to shift 
the power spectrum sideways. This is because the acous- 
tic oscillations at z > 1000 (at which time f2k ~ ^A ~ 
regardless of their present value) depend only on ui m , uj^ 
and and the geometric projection of these fixed length 
scales onto angular scales 9 in the sky obeys 9 oc l/di ss , 
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where di ss is the angular diameter distance to the last scat- 
tering surface. In T99 and Efstathiou et al. (1999), <ii ss was 
estimated analytically by integrating out to the redshift of 
last scattering given by the fit of Hu & Sugiyama (1996). 
Since CMBfast automatically computes this quantity any- 
way, we eliminate this approximation by simply using this 
numerical value. 

O m and Oa also modify the late integrated Sachs- Wolfe 
effect, but this is important only for I < 30 (Eisenstein 
et al. 1999). The only other effect is a small correction 
due to gravitational lensing (Metcalf & Silk 1997; Stompor 
& Efstathiou 1999), which we ignore here because of the 
large error bars on current small-scale data. To map the 
model p* into the model p with all parameters except r, 
Ok and Oa unchanged, we thus multiply its high I power 
spectrum by e 2{j ~ r ) and shift it to the right by an £-factor 
of diss/dj;,,. 

We therefore adopt the following procedure for the first 
two steps. In Step 1, we compute 

• scalar power spectra out to i = 5000 for the subgrid 
with t = Ok = = (merely 6,561 models), 

• scalar power spectra out to £ = 400 with the sub- 
grid restricted to r = 0,0.1,0.8, cu m = 0.02,0.2,0.8, 
u h = 0.003,0.02,0.13, u v = 0,0.2,0.8 (80,190 mod- 
els), and 

• tensor power spectra with the matter densities re- 
stricted to this same subgrid (80,190 models). 

In Step 2, we use cubic spline interpolation separately 
for each £ to extend the tensor models and the low-£ scalar 
models to the full parameter grid. To account for the 
effects of t, Ok and Oa, we then shift the high-^ scalar 
models vertically and horizontally as described above and 
splice them together with the corresponding low-£ models 
at a cutoff value £* . For a given model, we choose £* to be 
100 multiplied by the horizontal shifting factor. In other 
words, the high-^ model always gets spliced at the location 
that corresponded to £ = 100 before shifting it sideways, 
so open models get spliced at higher £ and closed at lower. 
When computing the low-^ models in Step 1, we therefore 
adjust the accuracy flag "ketamax" in CMBfast to be 400 
times this same shifting factor. 

The public releases of CMBfast normalize the power 
spectra Ci to COBE automatically. This normalization 
scheme is not appropriate for our merging technique, since 
we need a convention independent of the cosmological pa- 
rameters so that when we combine the high and low grids, 
the relative normalization of the models is correct. To 
achieve this, we removed the COBE normalization from 
CMBfast and normalized the power spectrum in both the 
flat and non-flat codes to agree on scales much smaller 
than the curvature scale. 

For the reader interested in implementing this scheme, 
it is worth noting that almost all the time in Step 1 is 
spent on the low scalar grid. For this grid, substantial 
time is saved by only computing the power spectrum for 
the low f-values where it is needed. Note that the loop 
over tilts (n s or n t ) is essentially free, since CMBfast can 
compute multiple tilts simultaneously. The only reason 
we have used so few tilt values is because of disk space 
considerations in Steps 2 through 4. Including various test 
runs, we filled up more than half of a 200 GB disk array. 



2.6. Testing step 2 

To test the accuracy of the resulting scalar and tensor 
model grids produced in Step 2, we drew a random sample 
of ~ 10 3 of the models and recomputed them from scratch 
with CMBfast. For most models, we found our results 
to be accurate to a within a few percent. The remain- 
der generally had very early reionization (high r and low 
ft,Ob), which causes a broad bump of regenerated power 
from motions on the new last scattering surface. Since our 
approximation simply suppresses the small scale power by 
e _2r , it therefore underpredicts the power on the angular 
scale corresponding to the horizon size at reionization. In 
addition, the interpolation performed poorly at the lowest 
£ for some quite crazy models, which could be remedied 
by running CMBfast on a finer grid. 

As data quality improves further, it will probably be 
worthwhile to simply include r explicitly in the high-^ 
grid. In this case, the remaining errors introduced by our 
approximation scheme can of course be continuously re- 
duced to zero by refining the (w c dm, ^b, t^v)-grid for low £ 
and shifting the splicing point upwards from £ ~ 100. 

2.7. Step 3: computing likelihoods 

We use the CMB data and window functions listed in 
Table 1 and shown in Figure 0. This is taken from the com- 
pilation of Lineweaver (1998) with the addition of the new 
results from QMAP (Devlin et al. 1998; Herbig et al. 1998; 
de Oliveira-Costa et al. 1998), MSAM (Wilson et al. 1999), 
Toco (Torbet et al. 1999; Miller et al. 1999), Python V 
(Coble 1999), Viper (Peterson et al. 2000) and Boomerang 
(Mauskopf et al. 1999). For an up-to-date annotated com- 
pilation of all current data, see Gawiser & Silk (2000). For 
the COBE data, we use the exact window function from 
Tegmark (1997b). In all other cases, we approximate the 
window functions by a Gaussian of FWHM=4i g h — Aow 
from Table 1 . This approximation does not appear to have 
much of an effect on the results: we repeated the analy- 
sis with the much more extreme approximation where the 
windows are delta functions at (£\ w + ^high)/2 and ob- 
tained essentally unchanged results. Knox & Page (2000) 
compared full window functions with delta functions and 
came to the same conclusion. 

As discussed in great detail by Bond, Jaffe & Knox 
(1998) and also by Bartlett et al. (1999), an accurate 
calculation of the likelihood function £(data|p) is non- 
trivial. If the band-power measurement a\ is a quadratic 
function of the sky temperatures measured by the exper- 
iment in question, then Ci(di] Ci (p)) is a generalized % 2 
distribution when viewed as a function of di (Wandelt et 
al. 1998), but sufficient details to compute this function 
exactly are rarely published when band power measure- 
ments arc released. Useful approximations have there- 
fore been derived that require only the asymmetry be- 
tween upper and lower error bars as input (Bond, Jaffe 
& Knox (1998), Bartlett et al. (1999). The former ap- 
proximation is implemented by a nice publicly available 
package called RADPACK, maintained by L loyd Knox at 
http:/ '/flight. uchicago.edu/knox /radpack.htm\ which was 
used in the analyses of Dodelson & Knox (2000) and Mel- 
chiorri et al. (2000). In this paper, we will stick with the 
cruder Gaussian approximation 

£(d;CK P ))« e ^ (d - <d))tC ~ 1(d - <d)) , (6) 
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where d is the vector of measurements d\,d2, ■■■d n and C 
is the associated n x n covariance matrix of measurement 
errors. This means that the full likelihood function C = 
e~ x / 2 , where \ 2 is simply the chi-squared goodness of fit 
of the model to the data. 

We have chosen to keep things this simple because we 
are currently unable to eliminate a third major source of 
inaccuracy: many of the recent multi-band measurements 
released (which dominate the constraining power) have 
non-negligible correlations between their different bands, 
but these correlations have not yet been published by the 
experimental teams. An alternative approach would be 
to convert these data sets to uncorrelated measurements, 
as was done with the 8 COBE points we use. In the in- 
terim, an alternative is to simply use only those exper- 
iments which either have very small correlations, or sig- 
nificant correlations which are publically available, as was 
done in Dodelson & Knox (2000) and Knox & Page (2000). 

We model C as a sum of three terms, C = C^ mcas - ) + 
Q(scal) _|_ Q(ical)^ corresponding to measurement errors, 
source calibration errors and instrument calibration errors, 
respectively. c( mcas ) reflects the part of the errors which 
are uncorrelated between the different experiments and is 
due to detector noise and sample variance. We approxi- 
mate it by 

C ij = °ij<7i » ( 7 ) 

where Oi is defined as the average of the upper and lower 
error bars quoted for di = ST 2 (not for ST) in Table 1. 

The last two terms reflect the correlations between mea- 
surements due to calibration errors. C^ Ical - ) is the part spe- 
cific to a single multi-band experiment and C' scal - ) is the 
part that is correlated with other experiments that are cal- 
ibrated off of the same (slightly uncertain) source. Both 
QMAP and Saskatoon calibrate off of Cass A, and we as- 
sume that a 8.7% error due to the flux uncertainty of this 
object is common to these experiments. MAT, MSAM and 
Boomerang all calibrate off of Jupiter. To be conservative, 
we assume that the full 5% calibration uncertainty from 
Jupiter's antenna temperature is shared by these experi- 
ment. The true correlation should be lower, since the three 
experiments observed Jupiter at different frequencies. The 
remaining multi-band experiments do not have any such 
inter-experiment correlations: COBE/DMR calibrated off 
of the dipole, Viper off of the moon and Python V off of 
internal loads. This contribution to the noise matrix is 
therefore 

cg cal ^(2 SlJ ) 2 ^, (8) 

where 

f 8.7% if i and j refer to QMAP or Saskatoon, 
Sij = I 5% if i and j refer to MAT, MSAM or Boom, 
1 otherwise. 

The factor of 2 in equation (|8|) stems from the fact that 
the percentage error on 5Tf is roughly twice that for STi 
as long as it is small. Similarly, the remaining term is 

Cg cal) = (2r iJ ) 2 d i d i) (10) 

where r\j = if i and j refer to different experiments. 
If band powers i and j are from the same experiment, 
then rij is the quoted quoted calibration error with the 



source contribution subtracted off in quadrature. We 
use r =0.063 for Saskatoon, 7.9% for QMAP, 14% for 
Python V, 8% for Viper, 8.7% for Toco 97, 6.2% for Toco 
98, for MSAM and 6.4% for Boomerang. 

There is certainly ample room for improvement of in 
this 3rd step. To put all these statistical issues in perspec- 
tive, however, the authors feels that an even more pressing- 
challenge will be to test the data sets for systematic er- 
rors, e.g., by comparing them pairwise where they overlap 
in sky coverage and angular resolution (Knox et al. 1998; 
Tegmark 1999a). 

2.8. Step 4 : Marginalizing 

For a Bayesian analysis, the 10-dimensional likelihood 
should be multiplied by a prior probability distribution re- 
flecting all non-CMB information, then rescaled so that it 
integrates to unity and can be interpreted as a probability 
distribution. To obtain constraints on some subset of the 
parameters (f2k and Q\, say), one would then marginal- 
ize over all other parameters by integrating over them. 
Such a direct integration was performed by Efstathiou et 
al. (1999) where the parameter space had fewer dimen- 
sions. Since such integration is quite time-consuming in a 
high-dimensional space, most other multi-parameter anal- 
yses published have adopted the alternative approach of 
maximizing rather than integrating over the unwanted pa- 
rameters. For instance, the reduced likelihood function for 
t is obtained by looping over a grid of r-values and choos- 
ing the remaining parameters so that they maximize the 
likelihood in each case. These two approaches are equiva- 
lent if the full likelihood function is a multivariate Gaus- 
sian, as shown in Appendix A. If Gaussianity is a poor 
approximation, the maximization approach can tend to 
underestimate the error bars (Efstathiou et al. 1999). The 
Gaussianity approximation is indeed a poor one at the mo- 
ment, especially for the case with no priors, but it should 
gradually improve as future data and non-CMB priors re- 
duce the size of the allowed parameter region. 

In the published grid-based implementations of the max- 
imization method (e.g., Lineweaver 1998; T99), the min- 
imization was performed by simply looking at the like- 
lihoods in the pre-computed model grid and picking the 
largest one. Since the true maximum does generally not 
reside exactly at a grid point, this method always underes- 
timates the true maximum. Unfortunately, the magnitude 
of this underestimation will vary in a rather random way, 
depending on how close to the constrained maximum hap- 
pens to be to the nearest grid point. This effect can cause 
jagged- looking and somewhat misleading results, as shown 
in Figure |[ Note that even an error as small as 0.5 in \ 2 
changes the likelihood by more than 20%. Some of the 
jaggedness/ringing seen in the plots in, e.g., Lineweaver 
(1998) and T99 is likely to be due to this effect. In con- 
trast, the ringing seen in many of the (f2 m ,f2A) exclusion 
plots further on in this paper is a purely cosmetic problem, 
due to instability in the IDL interpolation routine used to 
generate the contour plots. 

The problem at hand is to find the maximum of some 
hypersurface in a high-dimensional space. It is easy to see 
that if we approximate the surface by multilinear interpo- 
lation between the grid points where we know its height, 
we will recover this unsatisfactory method, since the in- 
terpolated surface can only have maxima at grid points. 
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Fig. 3. — Marginalization method comparison, x 2 I s plotted as a 
function of Oa when maximizing over all other parameters with no 
priors. The squares show the result of using multidimensional spline 
interpolation when maximizing and the crosses show the result of 
simply picking the smallest x 2 -value in the model grid. Note that 
a seemingly small error of unity in x 2 changes the likelihood by a 
factor of 1.6. 



We have chosen to use cubic spline interpolation instead. 
As seen in Figure |3[ this works substantially better and 
eliminates the random jaggedness of the simpler method. 

For the reader interested in implementing this method, 
we give some additional practical details below. Other 
readers may wish to skip directly to the next subsection. 

We perform the cubic spline interpolation and subse- 
quent maximization one dimension at a time. Just as for 
multilinear interpolation, the result of this procedure is 
independent of the order in which we interpolate over the 
different parameters. We start by maximizing over the 
scalar and tensor normalizations, which is readily done 
analytically since x 2 depends quadratically on A s and A t . 
We save the remaining 8-dimcnsional grid in a huge file 
together with the optimal values of A s and A t and the 
corresponding x 2 value. To marginalize over any given 
parameter pi , we first sort this file so that this parameter 
varies fastest. In each block where the remaining parame- 
ters are fixed, we then spline over this parameter and find 
the maximum p* analytically from the spline coefficients. 
Since it is interesting to keep track of the physical param- 
eters of the best fit models, we save not only the x 2 -value 
but also the other parameter values spline interpolated to 
the point where pi — p* , replacing the entire block of mod- 
els in the file by this interpolated one. 

We found that when \ 2 varies rapidly, a standard cubic 
spline occasionally causes unwanted oscillations. Such a 
rapid rise in x 2 occurs only in the extreme parts of the 
parameter grid that we do not care about (since they 
are completely ruled out), yet the resulting ringing eas- 
ily propagates to the region that we are interested in near 
the minimum. We therefore adopted a scheme where we 
through away irrelevant distant points before splining if 
they were too extreme. Specifically, before performing a 
1-dimensional cubic spline, we first located the lowest grid 
point. We then included all points to the left of it until we 
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Fig. 4. — The best fit model is shown for the case of no prior (solid 
red/dark grey) and with the priors h = 0.65 ± 0.07, h 2 n h = 0.02 
and r = r = (solid green/light grey). The dotted lines show the 
decomposition of the former curve into scalar and tensor fluctua- 
tions. The model parameters are listed in Table 2. Although all 65 
measurements were used in the fits, they have been averaged into 
14 bands in this plot to avoid cluttering. The band powers whose 
central I- value fell into any given band were average with minimum- 
variance weighting, and their corresponding window functions were 
averaged as well. This binning was used only in this plot, not in our 
analysis. 



reached one whose x 2 was higher by 10 or more. Points to 
the right were included analogously. We found this simple 
scheme to work quite well in practice. Indeed, the slight 
wiggliness of the contour plots shown in the next section 
is caused mainly by the plotting software itself (the 2D 
interpolation routine of IDL), not by our marginalization 
from 10 to 2 dimensions. 

3. RESULTS 
3.1. Basic results 

To avoid having our constraints severely diluted by 
"silly" models, we include two prior pieces of information 
when presenting our basic results. We assume that the 
Hubble parameter h — 0.65 ± 0.07 at 1 — a (see Freedman 
1999 for a recent review of /i-measurements) and that the 
baryon density Wb = h 2 Q,b ks 0.02 (Buries et al. 1999 re- 
port = 0.019 ± 0.0024, and we approximate the u>b 
error bars by zero since they are much smaller than our 
grid spacing). This value of Wb is roughly consistent 
with that measured by Wadsley et al. (1999) using the 
Helium Lyman- Alpha Forest. We assume that the error 
distribution for h is Gaussian. 

The parameters of the best fit model are listed in Table 2 
both with and without these priors. The corresponding no- 
prior power spectrum is shown in Figure |1| together with 
the "vanilla" version with the above-mentioned priors and 
r = r = 0. As can be seen, the fitting procedure uses the 
additional freedom to match features in the data in quite 
amusing ways. Since the data dip at £ ~ 50 and rise very 
sharply thereafter, a feature that simpler models cannot 
match, the minimization procedure finds the best fit model 
to have a dramatic blue-tilt (n s ~ 1.7) and almost the 



8 



entire COBE signal due to gravity waves. Although this 
particular model is ruled out by other constraints — for 
instance, primordial black hole abundance (Green et al. 
1997) and spectral distortions (Hu et al. 1994) give upper 
bounds n s < 1.3) — it illustrates the importance of fitting 
for all 10 parameters jointly. Indeed, it is the inclusion of 
gravity waves in our models that makes the constraints on 
n s so weak. 

The 1-dimensional likelihood functions for six of the best 
constrained parameters are shown in Figure ^, marginal- 
ized over the other 9 parameters. Although none of these 
parameters are very tightly constrained, it is encouraging 
that CMB observations are already sufficiently powerful 
to place upper and lower limits on O m , Qa and n s at the 
2 — a level. Because w c( jm and lo v are by definition non- 
negative, these density parameters are also bounded from 
both sides. On the other hand, better data will be required 
to place interesting constraints on r, since this parameter 
is almost degenerate with the overall normalization (see, 
e.g., Eisenstein et al. 1999). The best constrained param- 
eter so far is seen to be the spatial curvature f2k, with 
—0.24 < J7k < 0.38 at 95% confidence. For comparison, 
using Figure 2 in Dodelson & Knox (2000) to read off the 
point where the likelihood drops to e~ 2 '/ 2 ss 0.14 gives the 
95% upper limit f\ < 0.38. Although the exact numerical 
agreement is likely to be coincidental (since we use more 
data, etc), this is nonethethclcss very reassuring evidence 
that the basic result is robust. 

Because of the well-known angular diameter distance 
degeneracy, where increasing fi^ shifts the acoustic peaks 
to the right and increasing f?A can shift them back to the 
left, we also plot our constraints marginalized onto the 2- 
dimensional (f2 m , f2A)-plane. Figure |] shows the results 
using all the data, and Figures 0- ^ shown the constraints 
from various subsets that will be described below. In all 
cases, the shaded regions show what is ruled out at 95% 
confidence (2 — a). For our 2-dimensional parameter space, 
this corresponds to A\ 2 = 6.18 (not 4), as in Press et al. 
(1992) §15.6. 

We show four nested contours. The least constrain- 
ing one is when all 10 parameters are treated as un- 
known. The second includes our Hubble parameter prior 
h = 0.65 ± 0.07. The third (what we call our "basic re- 
sult") adds the nucleosynthesis constraint Wb ~ 0.02 and 
the fourth imposes r — t = 0. Although the first two 
priors are observationally well-motivated, the last one is 
completely ad hoc, and has only been included to illus- 
trate the importance of including reionization and gravity 
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Fig. 5. — The marginalized likelihood is shown for six individual 
parameters using all 65 band power measurements and priors only 
from nucleosynthesis (/i 2 f2 b = 0.02) and the Hubble parameter (h = 
0.65±0.07). The 2a limits (see Table 2) are roughly where the curves 
cross the horizontal lines. 
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Fig. 6. — The regions in the (f2 m , fiA)-plane that are ruled out at 
2<t using all the data are shown using no priors (red/dark grey), the 
prior h = 0.65±0.07 (orange red/grey), the additional nucleosynthe- 
sis constraint h 2 Q b = 0.02 (orange/light grey) and the additional 
constraints r = r = (yellow/very light grey). 



waves in analyses of this kind. 

When removing a prior constraint (ojb = 0.02) from our 
basic result, we reduce all x 2 -values by unity before plot- 
ting the corresponding contour, to account for the added 
degree of freedom. Similarly, we subtract 2 when dropping 
both constraints and add 2 when imposing r = r = 0. 

Figure || shows that the CMB data alone is able to rule 
out very open (Q m < 0.4) models with fl\ = 0. Adding 
the /i-constraint tightened the limits somewhat, mainly 
on very closed models. A more important prior at at 
this stage is that from nucleosynthesis, which helps elim- 
inate most of the remaining closed models and places the 
first lower limit on J7a • This makes the allowed region in 
the (f2 m , f2A)-plane bounded, which is important: other- 
wise all other constraints, which are marginalized over 17a, 
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Fig. 7. — Same as previous figure, but using only the COBE and 
the "East Coast" data sets (Saskatoon, QMAP and Toco). 
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Fig. 8. — Same as Figure ^| but using only COBE and "Snake" 
data sets (Python V and Viper). 



would depend sensitively on the poorly motivated prior 
f^A > — 1 that was hard- wired into our parameter grid. 

Adding the additional prior r = r = 0, which we recom- 
mend against for the reasons described in the introduction, 
is seen to rule out about half of the remaining models. The 
exclusion of these parameters is seen to predominantly rule 
out closed models, whose first acoustic peak is too far to 
the left. This is because it can be shifted back to the right 
by tilting the power spectrum (increasing n s ), after which 
the peak height can be brought back down to allowed lev- 
els using reionization or gravity waves. In contrast, it is 
not possible to salvage too open models with this trick: de- 
creasing n s would require raising the first peak, but there 
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Fig. 9. — Same as Figure H, but using only COBE and Boomerang 
data sets. The yellow/light grey contour corresponds to the result 
of Melchiorri et al (2000) if we impose ui v = 0. 



is of course no such thing as negative reionization or neg- 
ative gravity waves. 

3.2. Is everything consistent? 

The plots we have shown so far are Bayesian in nature, 
and can only be interpreted as advertised if the data are 
consistent with the best fit model. This is indeed the case, 
since we obtain x 2 = 49 for the best fit model. Dropping 
the constraints on h and cjb reduces \ 2 by as much as 
6, corresponding to the rather unphysical model shown in 
Figure [|. For comparison, 65 data points and 10 param- 
eters gives 55 degrees of freedom^, so we should expect 
X 2 = 55 ± 21 at 2<7. 

3.3. Robustness to choice of data 

To investigate the relative constraining power of differ- 
ent data sets and the degree to which they give consistent 
results, we repeated our analysis for three subsets of the 
observations. Specifically, we partitioned the most recent 
observations reporting multiple band powers into three dis- 
joint sets and combined each one of them with the COBE 
measurements : 

1. The "East Coast" sample contains Saskatoon, QMAP, 
TOCO and COBE. 

2. The "snake" sample contains Python, Viper and 
COBE. 

3. The Boomerang sample contains Boomerang-97 and 
COBE. 

1 ln fact, our parameters do not span a full 10-dimensional sub- 
space of the 65-dimensional data space when they range over physi- 
cally reasonable values, since some of them have only a minor impact 
(say nt) or are subject to near degeneracies like (A B , r), (Ojj, Q^) and 
(^cdm ) Wi/) • The effective number of degrees of freedom to subtract 
off may therefore be closer to 6 than 10. 
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As seen in Figures [7h9, they all allow flat models and dis- 
favor very open (f^ 3> 0) models, which would place the 
first acoustic peak too far to the right. As more priors get 
added, they are seen to disfavor very closed models as well. 
In all cases, the best fit model has an acceptable x 2 -value. 

3.4. Importance of calibration errors 

To assess the importance of calibration uncertainties, we 
repeated our analysis with all calibration errors set to zero. 
We found that in this case, no model provided a very good 
fit to the data, with x 2 ~ 76 for the best fit. This is only a 
problem at the 2cr-level for 65-10=55 degrees of freedom, 
and perhaps even less in light of footnote 1. However, it 
nonetheless caused the the Baysean constraints to become 
quite misleading, suggesting that most parameters were 
very tightly constrained around their maximum-likelihood 
values — for instance, that lo v — was ruled out at high 
significance. In conclusion, it is of paramount importance 
to include calibration errors. This was done in the above- 
mentioned analyses of Dodelson & Knox (2000) and Mel- 
chiorri et al. (2000), but not in most earlier work. 

The main discrepancies pushing up the x 2 were local- 
ized to two places in Figure ^. The first trouble spot was 
at 40 < I < 70, where the low Python V points con- 
flicted with the higher measurements on a similar scale 
from, e.g., Toco, QMAP and Saskatoon. The second prob- 
lem occurred at I < 300, where the models failed to fall 
rapidly enough from the high Toco detections down to the 
lower power levels seen by MS AM, CAT, OVRO, Viper 
and Boomerang. 

3.5. Are the data internally consistent? 

Based on visual inspection of the data, it has been sug- 
gested that all CMB measurements cannot be consistent 
with any model, since some measurements disagree with 
others on a comparable angular scale. Although we saw 
above that the x 2 -value is acceptable, the distribution of 
residuals could in principle be non-Gaussian with the a few 
severe outliers being averaged down beyond recognition in 
the x 2 -calculation. To investigate this possibility, we fit 
a 10-paramcter model with no underlying physical model 
to the data. Our model curve is simply a cubic spline in- 
terpolated between 10 grid points. Figure [lO] shows the 
65 residuals (di — (di))/ai, ignoring calibration errors, and 
reveals no striking outliers at all. 

This fit gives a x 2 ~ 67 ignoring calibration errors, i.e., 
even lower than for the CMB case. In view of footnote 1, 
we repeated this test with merely six spline points. This 
gave x 2 ~ 95 for logarithmically equispaced spline points, 
but as low a x 2 as before when more points were shifted 
to be near the 1st acoustic peak. 

In the future, as CMB data gets still better, one would 
expect the correct physically based model to provide a 
substantially better fit than "any old smooth curve" with 
the same number of free parameters. Until then, i.e., until 
our physical theory provides the most economical expla- 
nation of the observations, we cannot interpret the good 
fit of the model to the data as overwhelming evidence that 
our theory is correct. 

4. CONCLUSIONS 

We have presented a method for rapid calculation of 
large numbers of CMB models and used it to jointly con- 
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Fig. 10. — Residuals. The top panel shows a cubic spline inter- 
polated between 10 equispaced grid points (large squares) that are 
adjusted vertically to make the window function convolved curve 
(small squares) fit the observations (triangles) as well as possible. 
The middle panel shows the residuals (di — (rfj))/<Tj, the differences 
between the triangles and small squares in units of the error bars. 
The bottom panel shows a histogram of these residuals compared 
with a unit Gaussian. The reduced x 2_va l ue is simply the second 
moment of this distribution. 

strain 10 cosmological parameters from current CMB data. 
Our results on individual parameters are summarized in 
Table 2. Arguably the most interesting constraints at this 
point are those on the geometry of spacetime, summarized 
in Figure [ll]. This figure zooms in on the upper left quar- 
ter of Figure ^ and shows the joint constraints on f2 m and 
J7a from a variety of astrophysical observations. The SN 
la constraints are from White 1998, combining the data 
from both search teams (Perlmutter et al. 1998; Riess et 
al. 1998; Garnavich et al. 1998). As can be seen, the CMB 
and SN la constraints imply a positive cosmological con- 
stant (f^A > 0) when combined. If the Falco et al. (1998) 
constraints from gravitational lens statistics are included, 
the allowed region in parameter space is further reduced. 

This claim that SI a > is of course old hat (Kamionkowski 
& Buchalter 2000), originally being made over a year ago 
(see Sahni & Starobinsky 2000 for a recent review) . What 
is new here, and quite striking, is its robustness. Since the 
first such joint analysis (White 1998), the number of CMB 
band power measurements has roughly doubled, with ex- 
periments such as Toco, Python V, Viper and Boomerang 
greatly improving the accuracy on acoustic peak scales. In 
addition, the CMB treatments has been gradually refined; 
for example, several groups have added calibration errors 
and this analysis has weakened the constraints further by 
fitting for 10 parameters jointly. Yet despite these major 
improvements in both data and modeling, the cosmolog- 
ical constant remains alive and well, stubbornly refusing 
to vanish. 

Since CMB data is likely to continue to improve at a 
rapid pace, with exciting new balloon, interferometer and 



11 




0.0 0.5 1.0 
Matter density Q. m 

Fig. 11. — Constraints in the Q m — plane. The regions in the 
(n m , QA)-plane that are ruled out by our analysis at 2<r using all the 
data are shown using no priors (red/dark grey), the prior h = 0.65 ± 
0.07 (orange/grey), and the additional nucleosynthesis constraint 
/i 2 f2b = 0.02 (light orange/light grey). The SN la constraints are 
from White (1998) and the lensing constraints are from Falco et al 
(1998). The CMB contours for f2 m < 0.2 are extrapolations. 

satellite data just around the corner, it will be important 
to further improve on the type of analysis that we have 
presented here. There are a number of areas in which the 
accuracy off our treatment can be improved: 

• The problem of regenerated power from very early 
rcionization can be eliminated by explicitly looping 
over t for the high I models in Step 1 instead of using 
the e~ 2r suppression approximation. 

• In Steps 1 and 2, the effect of gravitational lensing 
can be included. 

• In Steps 1 and 2, further speed-up can be attained by 
taking advantage of the fact that the tensor fluctua- 
tions are essentially independent of io v as long as the 
total dark matter density cj ct j m + w„ stays constant. 

• The accuracy in Step 2 can probably be further im- 
proved by using some form of morphing technique as 
suggested by Sigurdson & Scott (2000). The basic 
idea is to interpolate not the power spectrum itself 
but some cleverly chosen parametrization thereof. 
We have done this to a certain extent by computing 
and interpolating the amount by which the acoustic 
peaks should be shifted sideways, but more ambi- 
tious reparametrizations are clearly possible. 

• In Step 3, the likelihoods can be computed more 
accurately by incorporating non-Gaussianity correc- 
tions as in Bond, Jaffe & Knox (1998) or Bartlett 
et al. (1999) and by including correlations between 
different data points. The former is particularly im- 
portant for upper limits, which were simply excluded 



from the present analysus. The latter includes corre- 
lations between different experiments that overlap in 
sky coverage and angular scale. Calibrations can be 
treated as multiplicative parameters to be marginal- 
ized over (as in Dodelson & Knox 2000) rather than 
as correlated noise (our approximation is accurate as 
long as the relative calibration errors are much less 
than unity). 

• Step 3 should ideally use the exact band power win- 
dow functions. Unfortunately, most window func- 
tions available in the literature are variance window 
functions, and using them as band-power window 
functions is an approximation which is not always 
good. Experimentalists are strongly encouraged to 
publish their band power window functions! 

• The overall accuracy of our technique can be im- 
proved with brute force, by computing a finer grid 
of models in step 1. Indeed, the errors introduced 
in Step 2 can in principle be continuously reduced 
toward zero by refining the (w c( j m , Wb, w !/ )-grid for 
low £ and shifting the splicing point upwards from 
£~100. 

• The accuracy in Step 4 can be improved by inte- 
grating instead of maximizing when marginalizing. 
This will make a difference mainly early on when the 
10-dimensional probability distribution in parameter 
space is widely extended and differs greatly from a 
multivariate Gaussian distribution. If this integra- 
tion approach is used, it should be applied even for 
the normalizations A s and A t , for consistency. 

A second general area of improvement will be to include 
more prior information than Hubble parameter measure- 
ments and nucleosynthesis constraints. As data improves 
in a wide variety of areas, this will not only help break 
parameter degeneracies, but also allow important cross- 
checks. A very large number of such multi-dataset studies 
have been carried out in the past (Bahcall et al. 1999 and 
Bridle et al. 1999 provide good recent entry points into 
the literature), but rarely for more than a few parameters 
at a time. Here is a necessarily incomplete list of such 
constraints: 

• Measurements of the matter power spectrum and its 
time-evolution P(k, z) from galaxy redshift surveys. 

• Measurements of P(k, z) from weak gravitational 
lensing (e.g., Narayan & Bartelmann 1996) 

• Measurements of P(k, z) from the abundance of 
galaxy clusters (e.g., Carlberg 1997; Bahcall & Fan 
1998; Eke et al. 1998.) 

• Constraints om P(k) from peculiar velocity measure- 
ments (e.g., Zehavi & Dekel 1999). 

• Limits on (h, 0,^,^1^) from SN la. 

• Limits on (h, f2k, £Ia) from lens statistics (e.g., Kochanek 
1996; Falco et al. 1998; Bartelmann et al. 1998; Hel- 
big 1999) 
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• Limits on {h, fi^, ^a) from limits on the age of the 
Universe and various other classical cosmological 
tests (Peebles 1993). For instance SZ cluster distance 
measurements provide promising new constraints of 
this type (Reese et al. 2000). 

• Direct measurements of f2 m and the baryon fraction 
^b/^m from cluster studies (Carlberg et al. 1999; 
White et al. 1993; Danos k, Pen 1998; Cooray 1998) 

Finally, adding more physics can both weaken and 
tighten constraints. Adding further parameters (say an 
equation of state for a scalar field component) can weaken 
constraints on other semi-degenerate parameters. On the 
other hand, adding an astrophysical model for, say, how r 
depends on the other parameters can substantially tighten 
constraints (Venkatesian 2000). 

In conclusion, as CMB experimentalists continue to 
forge ahead, CMB theorists will need to work hard to keep 
up. 
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tion, and Hubble Fellowships HF-01084.01-96A and HF- 
01116.01-98A from STScI, operated by AURA, Inc. under 
NASA contract NAS5-26555. 



APPENDIX A 
CONDITIONAL MARGIN ALIZATION 

In this Appendix, we show that maximizing is equiv- 
alent to integrating when marginalizing multidimensional 
Gaussians in arbitrary dimensions. Although this useful 
property is undoubtedly derived in the statistics literature, 
we present a brief derivation here for completeness. 

A multivariate Gaussian distribution in n dimensions 
takes the form 



/(x) = (2tt|C|)- 



i/2 f 



(x-x) t C _1 (x-x) 



(Al) 



where x is the mean vector and C is the n x n covariance 
matrix. Let us partition the n parameters in x into two 



subsets y and z of size n x and n y (n x 



(D 



D 

E' 



E 
F 



n) and write 



(A2) 



We can now define a probability distribution for y in two 
different ways, by either integrating or maximizing over z: 



/int(y) = / /(X)<f** 



(A3) 



/max(y) = cmax/(x), 



(A4) 



where the normalization constant c is chosen so that / max 
integrates to unity. Maximizing / is equivalent to mini- 
mizing the quadratic form (x — x)'C _1 (x — x). Inserting 



equation (A2) and differentiating with respect to z shows 
that this minimum is attained for 



= z-F- 1 E'(y-y). 



Substituting this back into equation (A4) gives 
/ ma x(y)oce-iCy-y) t [D- E F E 1 (y -y) ) 



(A5) 



(A6) 



i.e., a Gaussian with mean y and covariance matrix [D — 
EFE'] -1 . As is well known, integrating over z also gives 
a Gaussian with mean y and a covariance matrix which 
is simply the upper left submatrix of the full covariance 
matrix C. The identity 



D 

E' 



E 
F 



(A7) 



[D-EF^E*]- 1 
-P- 1 E*fD - EF^E'l 



-D _1 E[F - ED" 1 ^] 
[F - ED _1 E*1 _1 



therefore shows that the covariance matrix is [D— EF _1 E i ] _1 , 
i.e., the same as for the maximization case. This proves 
that /; nt = /maxj that the two methods of marginal- 
ization give identical results when the probability distri- 
bution is Gaussian. The identity given by equation (A7) is 



readily proven by simply multiplying the matrices on the 
left and right hand sides together and verifying that their 
product is the identity matrix. 
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